Multilingual Alignments by Monolingual String Differences
Authors
Abstract
We propose a method to obtain subsentential alignments from several languages simultaneously. Because the method handles all languages at once, it avoids the complexity explosion caused by the usual pair-by-pair processing. It can be applied to different units (characters, morphemes, words, chunks). We evaluate word alignments on a trilingual machine translation corpus and report a comparison of the results with those obtained by state-of-the-art alignment software.
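To make the cost of pair-by-pair processing concrete, the sketch below counts how many bilingual alignment runs are needed to cover every language pair, compared with a single multilingual run. It assumes one run per unordered pair, n(n-1)/2 in total, or one run per direction, n(n-1), when each pair is aligned both ways before symmetrization, as is common practice with bilingual aligners. The function name and the language codes are hypothetical, and the sketch does not implement the paper's alignment method; it only illustrates the quadratic growth in the number of runs.

```python
from itertools import combinations


def pairwise_alignment_runs(languages, directional=False):
    """Count the bilingual alignment runs needed to cover all language pairs.

    Undirected pairs: n(n-1)/2. If every pair is aligned in both
    directions (hypothetical but common setup), the count doubles to n(n-1).
    """
    pairs = list(combinations(languages, 2))
    return 2 * len(pairs) if directional else len(pairs)


# Hypothetical language sets; a simultaneous multilingual method needs one run per set.
for langs in (["en", "fr", "zh"], ["en", "fr", "zh", "ja", "de"], [f"L{i}" for i in range(11)]):
    print(
        f"{len(langs)} languages: "
        f"{pairwise_alignment_runs(langs)} pairs, "
        f"{pairwise_alignment_runs(langs, directional=True)} directional runs, "
        "vs 1 multilingual run"
    )
```

With three languages, as in the trilingual evaluation, the pairwise count is still small (3 pairs), but it grows quadratically: 11 languages already require 55 pairs.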
Similar sources
Cross-Lingual Validation of Multilingual Wordnets
Incorporating Wordnet or its monolingual followers in modern NLP-based systems already represents a general trend motivated by numerous reports showing significant improvements in the overall performances of these systems. Multilingual wordnets, such as EuroWordNet or BalkaNet, represent one step further with great promises in the domain of multilingual processing. The paper describes one possi...
Phylogenetic Grammar Induction
We present an approach to multilingual grammar induction that exploits a phylogeny-structured model of parameter drift. Our method does not require any translated texts or token-level alignments. Instead, the phylogenetic prior couples languages at a parameter level. Joint induction in the multilingual model substantially outperforms independent learning, with larger gains both from more articu...
Extracting Multilingual Topics from Unaligned Comparable Corpora
Topic models have been studied extensively in the context of monolingual corpora. Though there are some attempts to mine topical structure from cross-lingual corpora, they require clues about document alignments. In this paper we present a generative model called JointLDA which uses a bilingual dictionary to mine multilingual topics from an unaligned corpus. Experiments conducted on different d...
Studying Luxembourgish Phonetics via Multilingual Forced Alignments
Luxembourgish, a Germanic-Franconian language, is embedded in a multilingual context on the divide between Romance and Germanic cultures and remains one of Europe’s under-described languages. This paper investigates the similarity between Luxembourgish phone segments with German, French and English via forced speech alignment techniques. Making use of monolingual acoustic seed models from these...
NLP and IR Approaches to Monolingual and Multilingual Link Detection
This paper considers several important issues for monolingual and multilingual link detection. The experimental results show that nouns, verbs, adjectives and compound nouns are useful to represent news stories; story expansion is helpful; topic segmentation has a little effect; and a translation model is needed to capture the differences between languages.
Publication date: 2008